# 16 kHz audio processing

- **FocalCodec 25Hz** (lucadellalib, Apache-2.0; 25 downloads, 1 like)
  Low-bitrate speech codec based on a focal modulation network, supporting 16 kHz speech encoding.
  Tags: Speech Synthesis
- **Audio Emotion Detection** (Hatman, Apache-2.0; 630 downloads, 8 likes)
  Fine-tuned from facebook/wav2vec2-large-xlsr-53 for audio emotion detection; recognizes seven emotional states.
  Tags: Audio Classification, Transformers
- **Sentis Whisper Tiny** (unity, Apache-2.0; 253 downloads, 48 likes)
  Whisper-Tiny is a small automatic speech recognition (ASR) model from OpenAI, designed for speech-to-text tasks and packaged for Unity environments.
  Tags: Speech Recognition
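Most of the fine-tuned ASR checkpoints in this list can be loaded through the Hugging Face `transformers` pipeline API. A minimal sketch, assuming a CTC-style checkpoint; the model ID shown is illustrative, and any fine-tuned ASR model from this list can be substituted:

```python
import numpy as np
from transformers import pipeline

# Build an ASR pipeline; "facebook/wav2vec2-base-960h" is an illustrative
# CTC checkpoint, not one of the entries above.
asr = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h")

# The pipeline accepts a file path or a raw mono float32 array already
# sampled at the model's expected rate (16 kHz for these models).
audio = np.zeros(16000, dtype=np.float32)  # one second of silence as a stand-in
result = asr(audio)
print(result["text"])
```

The same call works for the multilingual XLSR fine-tunes below; only the model ID changes.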
- **Wav2vec2 French Phonemizer** (Cnam-LMSSC, MIT; 9,832 downloads, 7 likes)
  Fine-tuned for French speech-to-phoneme conversion, based on facebook/wav2vec2-base-fr-voxpopuli-v2 and trained on the Common Voice v13 dataset.
  Tags: Speech Recognition, Transformers, French
- **Wav2vec2 Large Vi Vlsp2020** (nguyenvulebinh; 385 downloads, 4 likes)
  Vietnamese automatic speech recognition model based on the wav2vec2 architecture, pre-trained on 13,000 hours of unlabeled YouTube audio and fine-tuned on 250 hours of labeled data.
  Tags: Speech Recognition, Transformers, Other
- **Wav2vec2 Conformer Rope Large 100h Ft** (facebook, Apache-2.0; 99 downloads, 0 likes)
  Wav2Vec2-Conformer model fine-tuned on 100 hours of LibriSpeech data, using rotary position embeddings.
  Tags: Speech Recognition, Transformers, English
- **Wav2vec2 Large 10min Lv60 Self** (Splend1dchan, Apache-2.0; 177 downloads, 0 likes)
  Large Wav2Vec2 model pre-trained on Libri-Light and fine-tuned on 10 minutes of LibriSpeech data with a self-training objective; expects 16 kHz speech input.
  Tags: Speech Recognition, Transformers, English
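Several of these cards note that the model expects 16 kHz input. Audio recorded at another rate has to be resampled first; a minimal sketch using scipy's polyphase resampler (the helper name and rates are illustrative):

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly

def to_16k(audio: np.ndarray, orig_sr: int, target_sr: int = 16000) -> np.ndarray:
    """Resample a mono signal to 16 kHz with a polyphase filter."""
    g = gcd(orig_sr, target_sr)
    return resample_poly(audio, target_sr // g, orig_sr // g)

# One second at 44.1 kHz becomes one second at 16 kHz.
x = np.random.randn(44100).astype(np.float32)
y = to_16k(x, 44100)
print(len(y))  # 16000 samples
```

Feeding audio at the wrong rate does not raise an error with most of these models; it silently degrades accuracy, so resampling is worth doing explicitly.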
- **Data2vec Audio Large 10m** (facebook, Apache-2.0; 19 downloads, 0 likes)
  Data2Vec is a general self-supervised learning framework for speech, vision, and language. This large audio model is pre-trained and fine-tuned on 10 minutes of LibriSpeech data; expects 16 kHz speech input.
  Tags: Speech Recognition, Transformers, English
- **Data2vec Audio Large** (facebook, Apache-2.0; 97 downloads, 1 like)
  Large Data2Vec model pre-trained on 16 kHz speech audio with a self-supervised learning framework, suitable for tasks such as speech recognition.
  Tags: Speech Recognition, Transformers, English
- **Wav2vec2 Large Xlsr 53 Ukrainian** (anton-l, Apache-2.0; 21 downloads, 1 like)
  Ukrainian automatic speech recognition (ASR) model fine-tuned from facebook/wav2vec2-large-xlsr-53 on the Common Voice dataset.
  Tags: Speech Recognition, Other
- **Data2vec Audio Base 100h** (facebook, Apache-2.0; 4,369 downloads, 1 like)
  Base Data2Vec audio model pre-trained and fine-tuned on 100 hours of LibriSpeech audio.
  Tags: Speech Recognition, Transformers, English
- **Hubert Xlarge Ll60k** (facebook, Apache-2.0; 3,874 downloads, 5 likes)
  HuBERT is a self-supervised speech representation model that learns joint acoustic and linguistic representations through a BERT-like predictive loss.
  Tags: Speech Recognition, Transformers, English
- **Wav2vec2 Large Xlsr Turkish** (cahya, Apache-2.0; 61 downloads, 2 likes)
  Turkish automatic speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53 on the Turkish Common Voice dataset, reaching a test WER of 21.13%.
  Tags: Speech Recognition, Other
- **Wav2vec2 Base Sl Voxpopuli V2** (facebook; 31 downloads, 0 likes)
  Wav2Vec2 base model pretrained for Slovenian (sl) on 11.3k hours of unlabeled data from the VoxPopuli corpus.
  Tags: Speech Recognition, Transformers, Other
- **Wav2vec2 Base Pt Voxpopuli V2** (facebook; 30 downloads, 0 likes)
  Wav2Vec2 base model pretrained on the Portuguese portion of the VoxPopuli corpus, suitable for speech recognition tasks.
  Tags: Speech Recognition, Transformers, Other
- **Wav2vec2 Large Slavic Voxpopuli V2** (facebook; 26 downloads, 0 likes)
  Wav2Vec2 large model pre-trained on 89k hours of unlabeled data from the Slavic-language subset of the VoxPopuli corpus.
  Tags: Speech Recognition, Transformers
- **Wav2vec2 Large Baltic Voxpopuli V2** (facebook; 25 downloads, 0 likes)
  Wav2Vec2 large model pre-trained on 27.5k hours of unlabeled data from the Baltic-language subset of the VoxPopuli corpus.
  Tags: Speech Recognition, Transformers
- **Wav2vec2 Base Pl Voxpopuli V2** (facebook; 30 downloads, 0 likes)
  Wav2Vec2 base model pretrained for Polish on the VoxPopuli corpus, suitable for speech recognition tasks.
  Tags: Speech Recognition, Transformers, Other
- **Greek Lsr 1** (skylord, Apache-2.0; 17 downloads, 0 likes)
  Greek automatic speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53.
  Tags: Speech Recognition, Transformers, Other
- **Wav2vec2 Large Xlsr 53 Rm Vallader** (anuragshas, Apache-2.0; 58 downloads, 0 likes)
  Speech recognition model for the Romansh Vallader dialect, fine-tuned from facebook/wav2vec2-large-xlsr-53; reaches a word error rate of 32.89%.
  Tags: Speech Recognition
- **Wav2vec2 Large Romance Voxpopuli V2** (facebook; 26 downloads, 0 likes)
  Wav2Vec2 large model pretrained only on 101.5k hours of unlabeled data from the Romance-language subset of the VoxPopuli corpus, suitable for speech recognition tasks.
  Tags: Speech Recognition, Transformers
- **Wav2vec2 Large Mt Voxpopuli V2** (facebook; 25 downloads, 0 likes)
  Wav2Vec2 large model pretrained exclusively on unlabeled Maltese (mt) data from the VoxPopuli corpus, suitable for speech recognition tasks.
  Tags: Speech Recognition, Transformers, Other
- **Xlsr Indonesia** (acul3, Apache-2.0; 23 downloads, 0 likes)
  Indonesian automatic speech recognition (ASR) model fine-tuned on the XLSR architecture using the Common Voice Indonesian dataset.
  Tags: Speech Recognition, Transformers, Other
- **Hubert Base Ls960** (facebook, Apache-2.0; 406.60k downloads, 55 likes)
  HuBERT is a self-supervised speech representation model that learns speech features through a BERT-like predictive loss, suitable for tasks such as speech recognition.
  Tags: Speech Recognition, Transformers, English
- **Wav2vec2 Base Sk Voxpopuli V2** (facebook; 31 downloads, 0 likes)
  Wav2Vec2 base model pretrained on Slovak data from the VoxPopuli corpus, suitable for speech recognition tasks.
  Tags: Speech Recognition, Transformers, Other
- **Wav2vec2 Base Sv Voxpopuli V2** (facebook; 30 downloads, 0 likes)
  Wav2Vec2 base model pre-trained for Swedish on 16.3k hours of unlabeled data from the VoxPopuli corpus.
  Tags: Speech Recognition, Transformers, Other
- **Wav2vec2 Base Cs Voxpopuli V2** (facebook; 33 downloads, 1 like)
  Wav2Vec2 base model pretrained on the VoxPopuli corpus, specialized for Czech speech processing.
  Tags: Speech Recognition, Transformers, Other
- **Wav2vec2 Base Fi Voxpopuli V2** (facebook; 29 downloads, 1 like)
  Wav2Vec2 base model pre-trained for Finnish, suitable for speech recognition tasks.
  Tags: Speech Recognition, Transformers, Other
- **Wav2vec2 Large Xlsr 53 French** (jonatasgrosman, Apache-2.0; 47.83k downloads, 11 likes)
  French speech recognition model fine-tuned from the XLSR-53 large model on the Common Voice dataset, supporting high-accuracy French speech-to-text conversion.
  Tags: Speech Recognition, French
- **Wav2vec2 Large Xlsr Persian** (m3hrdadfi, Apache-2.0; 562 downloads, 16 likes)
  Persian (Farsi) automatic speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53; expects 16 kHz audio input.
  Tags: Speech Recognition, Other
- **Wav2vec2 Large Xlsr Georgian** (m3hrdadfi, Apache-2.0; 66 downloads, 5 likes)
  Georgian automatic speech recognition (ASR) model fine-tuned from facebook/wav2vec2-large-xlsr-53 on the Common Voice dataset.
  Tags: Speech Recognition, Other
- **Wav2vec2 Base Et Voxpopuli V2** (facebook; 30 downloads, 0 likes)
  Wav2Vec2 base model pretrained specifically for Estonian.
  Tags: Speech Recognition, Transformers, Other
- **Wav2vec2 Large Xlsr Pt** (gchhablani, Apache-2.0; 29 downloads, 0 likes)
  Portuguese automatic speech recognition (ASR) model fine-tuned from facebook/wav2vec2-large-xlsr-53, reaching a 17.22% word error rate (WER) on the Common Voice Portuguese dataset.
  Tags: Speech Recognition, Other
- **Wav2vec2 Base En Voxpopuli V2** (facebook; 35 downloads, 1 like)
  Wav2Vec2 base model pre-trained on 24.1k hours of unlabeled English data from the VoxPopuli corpus, suitable for speech recognition tasks.
  Tags: Speech Recognition, Transformers, English
- **Wav2vec2 Large Xlsr 53 Tatar** (anton-l, Apache-2.0; 25 downloads, 1 like)
  Tatar speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53 on the Tatar Common Voice dataset.
  Tags: Speech Recognition, Other
- **Wav2vec2 Base De Voxpopuli V2** (facebook; 44 downloads, 1 like)
  Wav2Vec2 base model pretrained for German on 23.2k hours of unlabeled German data from the VoxPopuli corpus.
  Tags: Speech Recognition, Transformers, German
- **Wav2vec2 Base Nl Voxpopuli V2** (facebook; 22 downloads, 0 likes)
  Wav2Vec2 base model pretrained for Dutch on 19.0k hours of unlabeled data from the VoxPopuli corpus.
  Tags: Speech Recognition, Transformers, Other
- **Romanian Wav2vec2** (gigant, Apache-2.0; 88.90k downloads, 6 likes)
  Romanian speech recognition model fine-tuned from facebook/wav2vec2-xls-r-300m on Common Voice 8.0 and Romanian speech synthesis datasets; ranked first for Romanian in the Hugging Face Robust Speech Challenge.
  Tags: Speech Recognition, Transformers, Other
- **Wav2vec2 Large Uralic Voxpopuli V2** (facebook; 46 downloads, 0 likes)
  Wav2Vec2 large model pre-trained on 42.5k hours of unannotated Uralic-language data from the VoxPopuli corpus.
  Tags: Speech Recognition, Transformers
- **Wav2vec2 Base Lt Voxpopuli V2** (facebook; 31 downloads, 0 likes)
  Wav2Vec2 base model pretrained for Lithuanian on 14.4k hours of unlabeled data from the VoxPopuli corpus.
  Tags: Speech Recognition, Transformers, Other
© 2025 AIbase